In this data project we will focus on exploratory data analysis of stock prices. Keep in mind that this project is just meant to practice your visualization and pandas skills; it is not meant to be a robust financial analysis or to be taken as financial advice.
NOTE: This project is extremely challenging because it will introduce a lot of new concepts and have you looking things up on your own (we'll point you in the right direction) to try to solve the tasks issued. Feel free to just go through the solutions lecture notebook and video as a "walkthrough" project if you don't want to have to look things up yourself. You'll still learn a lot that way!
We'll focus on bank stocks and see how they progressed throughout the financial crisis all the way to early 2016.
In this section we will learn how to use pandas to read data directly from Google Finance!
First we need to start with the proper imports, which we've already laid out for you here.
Note: You'll need to install pandas-datareader for this to work! pandas-datareader allows you to read stock information directly from the internet. Install it with pip install pandas-datareader, or just follow along with the video lecture.
Already filled out for you.
In [1]:
from pandas_datareader import data, wb
import pandas as pd
import numpy as np
import datetime
%matplotlib inline
We need to get data using pandas-datareader. We will get stock information for the following banks: Bank of America, CitiGroup, Goldman Sachs, JPMorgan Chase, Morgan Stanley, and Wells Fargo.
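Figure out how to get the stock data from Jan 1st 2006 to Jan 1st 2016 for each of these banks. Set each bank to be a separate DataFrame, with the variable name for that bank being its ticker symbol. This will involve a few steps: use datetime to set the start and end dates, figure out the ticker symbol for each bank, and figure out how to use DataReader to grab the data for each stock.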
Use this documentation page for hints and instructions (it should just be a matter of replacing certain values). Use Google Finance as a source, for example:
# Bank of America
BAC = data.DataReader("BAC", 'google', start, end)
In [2]:
start = datetime.datetime(2006, 1, 1)
end = datetime.datetime(2016, 1, 1)
In [3]:
# Bank of America
BAC = data.DataReader("BAC", 'google', start, end)
# CitiGroup
C = data.DataReader("C", 'google', start, end)
# Goldman Sachs
GS = data.DataReader("GS", 'google', start, end)
# JPMorgan Chase
JPM = data.DataReader("JPM", 'google', start, end)
# Morgan Stanley
MS = data.DataReader("MS", 'google', start, end)
# Wells Fargo
WFC = data.DataReader("WFC", 'google', start, end)
In [4]:
# Could also fetch all six tickers in one call (older pandas versions returned a Panel object here)
df = data.DataReader(['BAC', 'C', 'GS', 'JPM', 'MS', 'WFC'],'google', start, end)
Create a list of the ticker symbols (as strings) in alphabetical order. Call this list: tickers
In [5]:
tickers = ['BAC', 'C', 'GS', 'JPM', 'MS', 'WFC']
Use pd.concat to concatenate the bank dataframes together to a single data frame called bank_stocks. Set the keys argument equal to the tickers list. Also pay attention to what axis you concatenate on.
In [6]:
bank_stocks = pd.concat([BAC, C, GS, JPM, MS, WFC],axis=1,keys=tickers)
Set the column name levels (this is filled out for you):
In [7]:
bank_stocks.columns.names = ['Bank Ticker','Stock Info']
Check the head of the bank_stocks dataframe.
In [8]:
bank_stocks.head()
Out[8]:
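To see what pd.concat with keys does to the column layout, here is a minimal sketch using two tiny made-up price frames (the names bac and c and every price in them are hypothetical stand-ins, not downloaded data). Concatenating on axis=1 places the frames side by side, and the keys become an outer column level, which is the two-level structure that .xs relies on later.

```python
import pandas as pd

# Two hypothetical price frames standing in for BAC and C (made-up values)
dates = pd.date_range("2021-03-01", periods=3, freq="D")
bac = pd.DataFrame({"Open": [10.0, 11.0, 12.0], "Close": [11.0, 12.0, 10.0]}, index=dates)
c = pd.DataFrame({"Open": [20.0, 21.0, 19.0], "Close": [21.0, 19.0, 22.0]}, index=dates)

# axis=1 puts the frames side by side; keys adds an outer column level
combined = pd.concat([bac, c], axis=1, keys=["BAC", "C"])
combined.columns.names = ["Bank Ticker", "Stock Info"]

# Columns are now (ticker, field) pairs
print(combined.columns.tolist())
# [('BAC', 'Open'), ('BAC', 'Close'), ('C', 'Open'), ('C', 'Close')]
```

Concatenating on the default axis=0 would instead stack the frames vertically and repeat each date, which is why the axis matters here.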
Let's explore the data a bit! Before continuing, I encourage you to check out the documentation on Multi-Level Indexing and Using .xs. Reference the solutions if you cannot figure out how to use .xs(), since that will be a major part of this project.
What is the max Close price for each bank's stock throughout the time period?
In [9]:
bank_stocks.xs(key='Close',axis=1,level='Stock Info').max()
Out[9]:
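The .xs call works because of the two-level columns. As a rough illustration on a small hypothetical frame with the same ('Bank Ticker', 'Stock Info') layout (all prices made up), selecting key='Close' on the 'Stock Info' level keeps one column per ticker and drops the inner level, so .max() then reports one value per bank.

```python
import pandas as pd

# Hypothetical frame with the same two-level column layout as bank_stocks
dates = pd.date_range("2021-03-01", periods=3, freq="D")
columns = pd.MultiIndex.from_product(
    [["BAC", "C"], ["Open", "Close"]], names=["Bank Ticker", "Stock Info"]
)
frame = pd.DataFrame(
    [[10.0, 11.0, 20.0, 21.0],
     [11.0, 12.0, 21.0, 19.0],
     [12.0, 10.0, 19.0, 22.0]],
    index=dates,
    columns=columns,
)

# Pick 'Close' from the inner level; the result has one column per ticker
close = frame.xs(key="Close", axis=1, level="Stock Info")
print(close.columns.tolist())  # ['BAC', 'C']
print(close.max().to_dict())   # {'BAC': 12.0, 'C': 22.0}
```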
Create a new empty DataFrame called returns. This DataFrame will contain the returns for each bank's stock. Returns are typically defined by:
$$r_t = \frac{p_t - p_{t-1}}{p_{t-1}} = \frac{p_t}{p_{t-1}} - 1$$
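A quick numeric check of this formula on four made-up prices: pct_change() computes $p_t / p_{t-1} - 1$ for each row, and the first value is NaN because there is no $p_{t-1}$ for it. The sketch compares the method against the formula written out with shift(1).

```python
import pandas as pd

# Hypothetical closing prices p_0 .. p_3
prices = pd.Series([100.0, 110.0, 99.0, 99.0])

# pct_change applies r_t = p_t / p_{t-1} - 1 row by row
by_method = prices.pct_change()

# The same formula spelled out: shift(1) lines up p_{t-1} with p_t
by_formula = prices / prices.shift(1) - 1

print(by_method.round(4).tolist())  # [nan, 0.1, -0.1, 0.0]
```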
In [10]:
returns = pd.DataFrame()
We can use the pandas pct_change() method on the Close column to create a column representing this return value. Create a for loop that, for each bank stock ticker, creates this returns column and sets it as a column in the returns DataFrame.
In [11]:
for tick in tickers:
    returns[tick+' Return'] = bank_stocks[tick]['Close'].pct_change()
returns.head()
Out[11]:
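The loop can also be replaced by a single vectorized expression. A sketch, assuming the two-level column layout used for bank_stocks: cross-section the Close prices with .xs, call pct_change() once on the resulting wide frame, and rename the columns with add_suffix. The frame below is hypothetical (made-up prices), and the loop is repeated only to confirm that both approaches agree.

```python
import pandas as pd

# Hypothetical two-bank frame laid out like bank_stocks (made-up prices)
dates = pd.date_range("2021-03-01", periods=3, freq="D")
columns = pd.MultiIndex.from_product(
    [["BAC", "C"], ["Open", "Close"]], names=["Bank Ticker", "Stock Info"]
)
frame = pd.DataFrame(
    [[10.0, 10.0, 20.0, 20.0],
     [10.0, 11.0, 20.0, 18.0],
     [11.0, 12.1, 18.0, 27.0]],
    index=dates,
    columns=columns,
)

# Vectorized: one cross-section, one pct_change call, then rename the columns
returns_vec = (
    frame.xs(key="Close", axis=1, level="Stock Info")
    .pct_change()
    .add_suffix(" Return")
)

# Loop version from the exercise, for comparison
returns_loop = pd.DataFrame()
for tick in ["BAC", "C"]:
    returns_loop[tick + " Return"] = frame[tick]["Close"].pct_change()

print(returns_vec.round(4))
```

The loop is easier to read for beginners; the vectorized form avoids building the DataFrame one column at a time.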
Create a pairplot using seaborn of the returns dataframe. What stock stands out to you? Can you figure out why?
In [13]:
import seaborn as sns
# Skip the first row: its returns are NaN because there is no previous close
sns.pairplot(returns[1:])
Out[13]:
Background on Citigroup's stock crash is available here.
You'll also see the enormous crash in value if you take a look at the stock price plot (which we do later in the visualizations).
Using this returns DataFrame, figure out on what dates each bank stock had the best and worst single-day returns. You should notice that 4 of the banks share the same day for the worst drop. Did anything significant happen that day?
In [14]:
# Worst Drop (4 of them on Inauguration day)
returns.idxmin()
Out[14]:
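idxmin() returns, for each column, the index label (here a date) where that column reaches its minimum. To spot a shared worst day without eyeballing the output, the resulting dates can be counted with value_counts(). The returns below are made up for illustration; with the real data the same two calls would be applied to returns.

```python
import pandas as pd

# Hypothetical daily returns for three tickers over three made-up dates
dates = pd.date_range("2021-03-01", periods=3, freq="D")
returns_demo = pd.DataFrame(
    {"BAC Return": [0.01, -0.20, 0.15],
     "C Return": [-0.30, -0.10, 0.20],
     "JPM Return": [0.02, -0.15, 0.25]},
    index=dates,
)

# Date label of each column's minimum
worst_days = returns_demo.idxmin()

# Counting those dates shows whether several tickers share the same worst day
shared = worst_days.value_counts()
print(shared)
```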
You should have noticed that Citigroup's largest drop and biggest gain were very close to one another. Did anything significant happen in that time frame?
In [15]:
# Best Single Day Gain
# Citigroup had a 1-for-10 reverse stock split in May 2011; JPM's best day was the day after the inauguration.
returns.idxmax()
Out[15]:
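One way to quantify "very close to one another" is to subtract each ticker's worst-day date from its best-day date. A minimal sketch on made-up returns: because idxmax() and idxmin() return dates, their difference is a Timedelta per ticker.

```python
import pandas as pd

# Hypothetical daily returns on five made-up consecutive dates
dates = pd.date_range("2021-03-01", periods=5, freq="D")
returns_demo = pd.DataFrame(
    {"C Return": [0.01, -0.40, 0.50, 0.00, 0.02],
     "MS Return": [0.04, 0.01, -0.02, 0.03, -0.05]},
    index=dates,
)

# Time between each ticker's best and worst day, as an absolute Timedelta
gap = (returns_demo.idxmax() - returns_demo.idxmin()).abs()
print(gap)
```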
Take a look at the standard deviation of the returns, which stock would you classify as the riskiest over the entire time period? Which would you classify as the riskiest for the year 2015?
In [16]:
returns.std() # Citigroup riskiest
Out[16]:
In [17]:
returns.loc['2015-01-01':'2015-12-31'].std() # Very similar risk profiles, but Morgan Stanley or BofA
Out[17]:
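Picking the "riskiest" stock can also be done programmatically, under the assumption (as in the cells above) that risk is measured by the standard deviation of daily returns. With a DatetimeIndex, .loc['2015'] selects every row in that year, and .idxmax() names the column with the largest standard deviation. The returns below are made up so that one ticker is riskiest overall and the other is riskiest in 2015.

```python
import pandas as pd

# Hypothetical daily returns spanning the end of 2014 and the start of 2015
dates = pd.to_datetime(
    ["2014-12-30", "2014-12-31", "2015-01-02", "2015-01-05", "2015-01-06"]
)
returns_demo = pd.DataFrame(
    {"BAC Return": [0.10, -0.10, 0.01, -0.01, 0.02],
     "MS Return": [0.00, 0.01, 0.03, -0.03, 0.04]},
    index=dates,
)

# Riskiest over the whole period
print(returns_demo.std().idxmax())  # BAC Return

# Partial-string indexing keeps only the 2015 rows
std_2015 = returns_demo.loc["2015"].std()
print(std_2015.idxmax())  # MS Return
```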
Create a distplot using seaborn of the 2015 returns for Morgan Stanley
In [18]:
sns.distplot(returns.loc['2015-01-01':'2015-12-31']['MS Return'],color='green',bins=100)
Out[18]:
Create a distplot using seaborn of the 2008 returns for CitiGroup
In [19]:
sns.distplot(returns.loc['2008-01-01':'2008-12-31']['C Return'],color='red',bins=100)
Out[19]:
In [20]:
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('whitegrid')
%matplotlib inline
# Optional Plotly Method Imports
import plotly
import cufflinks as cf
cf.go_offline()
Create a line plot showing Close price for each bank for the entire index of time. (Hint: Try using a for loop, or use .xs to get a cross section of the data.)
In [21]:
for tick in tickers:
    bank_stocks[tick]['Close'].plot(figsize=(12,4),label=tick)
plt.legend()
Out[21]:
In [22]:
bank_stocks.xs(key='Close',axis=1,level='Stock Info').plot()
Out[22]:
In [23]:
# plotly
bank_stocks.xs(key='Close',axis=1,level='Stock Info').iplot()
In [24]:
plt.figure(figsize=(12,6))
BAC['Close'].loc['2008-01-01':'2009-01-01'].rolling(window=30).mean().plot(label='30 Day Avg')
BAC['Close'].loc['2008-01-01':'2009-01-01'].plot(label='BAC CLOSE')
plt.legend()
Out[24]:
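The 30 Day Avg line comes from rolling(window=30).mean(): each point is the mean of the current close and the previous 29, and the first 29 points are NaN because the window is not yet full. The sketch below uses a window of 3 on six made-up prices so the arithmetic can be checked by hand.

```python
import pandas as pd

# Made-up closing prices on six consecutive business days
close = pd.Series(
    [10.0, 12.0, 14.0, 13.0, 12.0, 14.0],
    index=pd.date_range("2008-01-01", periods=6, freq="B"),
)

# Each value averages the current price and the previous window-1 prices;
# the first window-1 values are NaN because the window is not yet full
avg3 = close.rolling(window=3).mean()
print(avg3)  # NaN, NaN, 12.0, 13.0, 13.0, 13.0
```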
Create a heatmap of the correlation between the stocks' Close prices.
In [25]:
sns.heatmap(bank_stocks.xs(key='Close',axis=1,level='Stock Info').corr(),annot=True)
Out[25]:
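The heatmap colours each cell of the matrix returned by .corr(), which by default is the pairwise Pearson correlation between columns. A small sketch with made-up prices chosen so the results are predictable: a column that is an exact multiple of another correlates at +1, and one that moves in the opposite direction correlates at -1. The diagonal is always 1.

```python
import pandas as pd

# Made-up closing prices chosen so the correlations are easy to predict
close = pd.DataFrame(
    {"BAC": [10.0, 11.0, 12.0, 13.0],
     "C": [20.0, 22.0, 24.0, 26.0],    # always 2 x BAC, so correlation +1
     "WFC": [30.0, 29.0, 28.0, 27.0]}, # falls as BAC rises, so correlation -1
)

# Pairwise Pearson correlation (the default method) between columns
close_corr = close.corr()
print(close_corr.round(2))
```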
Optional: Use seaborn's clustermap to cluster the correlations together:
In [26]:
sns.clustermap(bank_stocks.xs(key='Close',axis=1,level='Stock Info').corr(),annot=True)
Out[26]:
In [27]:
close_corr = bank_stocks.xs(key='Close',axis=1,level='Stock Info').corr()
close_corr.iplot(kind='heatmap',colorscale='rdylbu')
Use .iplot(kind='candle') to create a candle plot of Bank of America's stock from Jan 1st 2015 to Jan 1st 2016.
In [28]:
BAC[['Open', 'High', 'Low', 'Close']].loc['2015-01-01':'2016-01-01'].iplot(kind='candle')
Use .ta_plot(study='sma') to create a Simple Moving Averages plot of Morgan Stanley for the year 2015.
In [29]:
MS['Close'].loc['2015-01-01':'2016-01-01'].ta_plot(study='sma',periods=[13,21,55],title='Simple Moving Averages')
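ta_plot(study='sma') computes and draws the averages for you. A rough pandas equivalent, assuming a simple moving average is the plain rolling mean of the Close price, is one rolling(window=p).mean() per period. The price series and the SMA(p) column names below are hypothetical; they are not cufflinks' own labels.

```python
import pandas as pd

# Hypothetical closing prices for 60 business days
close = pd.Series(
    [float(i % 7 + 10) for i in range(60)],
    index=pd.date_range("2015-01-01", periods=60, freq="B"),
)

# One simple moving average column per period, mirroring periods=[13,21,55]
sma = pd.DataFrame(
    {f"SMA({p})": close.rolling(window=p).mean() for p in [13, 21, 55]}
)

# Rows where every average is defined start once the 55-day window is full
print(sma.dropna().head())
```

Longer periods react more slowly to price changes, which is why the 55-period line lags the 13-period one.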
Use .ta_plot(study='boll') to create a Bollinger Band Plot for Bank of America for the year 2015.
In [30]:
BAC['Close'].loc['2015-01-01':'2016-01-01'].ta_plot(study='boll')
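Bollinger Bands are conventionally a rolling mean (the middle band) with upper and lower bands a fixed number of rolling standard deviations away from it. The sketch below assumes the common 20-period, 2-standard-deviation setting; we have not confirmed that these match cufflinks' defaults for study='boll', and the helper name bollinger and the price series are hypothetical. Note that pandas' rolling .std() uses the sample standard deviation (ddof=1); some charting tools use the population version, which gives slightly narrower bands.

```python
import pandas as pd


def bollinger(close: pd.Series, periods: int = 20, num_std: float = 2.0) -> pd.DataFrame:
    """Hypothetical helper: middle = rolling mean, upper/lower = middle +/- num_std rolling std."""
    middle = close.rolling(window=periods).mean()
    spread = num_std * close.rolling(window=periods).std()  # sample std (ddof=1)
    return pd.DataFrame({"middle": middle, "upper": middle + spread, "lower": middle - spread})


# Hypothetical closing prices for 30 business days
close = pd.Series(
    [float(10 + (i % 5)) for i in range(30)],
    index=pd.date_range("2015-01-01", periods=30, freq="B"),
)

bands = bollinger(close)
# The first 19 rows are NaN until the 20-period window fills
print(bands.dropna().head())
```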